Abstract



We study the problem of pattern matching in order-sorted languages whose evaluation strategy 
is lazy. We propose an extension of the Puel-Suarez compilation scheme to function definitions 
via order-sorted patterns. Basically, a list of ordered and possibly ambiguous linear patterns 
is transformed into a set of disjoint order-sorted constrained terms. This set is in turn 
transformed according to some normalization rules in order to build a pattern matching tree 
(PMT). Variables of order-sorted constrained terms now have not only structure, but also 
subsort constraints. Accordingly, discrimination trees are defined to have edges labeled with 
either structure or subsort constraints. Due to this latter kind of edge, we are not always forced 
to reduce terms to normal forms during the pattern matching process, taking advantage in this 
way of the lazy reduction scheme. For example, suppose $\sigma$ is a sort greater than $\eta$, the variable
$x^\eta$ is a pattern and $t^\sigma$ is a term of sort $\sigma$ to be matched. If $t^\sigma$ reduces to a term whose sort is
a subsort of $\eta$, it is already decidable that the term obtained matches $x^\eta$, even if it is not in
normal form. We show that the PMT is optimal if a decidable property of sequentiality holds
for the sets generated during the compilation process. Our method turns out to be applicable 
for strict languages as well. 



1 Introduction

Many programming languages use pattern matching in a many-sorted term algebra (such as 
those in the ML family [6]) or an order-sorted term algebra (such as those in the OBJ family [5]) 
for function argument-passing. Function definitions consist of an ordered set of rewrite rules. 
These rules are often ambiguous as some left-hand sides (LHS) of the same function definition 
may overlap. Thus direct access to the relevant rule based on the LHS's structure is not possible 
in general. The naive operational semantics, amounting to sequential lookup until a calling 
term matches a rule's LHS, obviously leads to poor performance. In addition, since a lazy 
evaluation strategy allows the manipulation of infinite objects (i.e. with no finite constructor 
normal forms), it is not clear what pattern matching means for lazy languages. For example, 
for a term to match an LHS term, the reduction scheme should be such that the only part of
the term to be evaluated is the one required, in some sense. Recently, Puel and Suarez [10] 
devised a clever compilation scheme to generate statically a PMT in lazy languages. Such a 
tree is then used at run-time for fast rule-indexing and takes full advantage of the nature of 
the LHS terms in a definition. Their work simplified and generalized seminal ideas by Huet 
and Levy [7] that were in turn sharpened by Laville [9]. The gist of the Puel-Suarez method 
rests on generalized notions of constructor terms and sequentiality. They called the new terms 
constrained terms. 

Although partially ordered sorts provide a substantially improved expressiveness over
many-sorted languages, in an order-sorted system with a lazy reduction strategy, pattern matching
is more complex than with non-ordered sorts in that it necessitates two kinds of verifications. 
The first one, as in the conventional case, is structure matching. The other one is to ascertain 
that the argument's sort is a subsort of the formal parameter's sort. Moreover, as functions can 
have a non-strict semantics, they can yield a result even for some arguments whose evaluation 
is non-terminating. Therefore, the arguments need only be evaluated just enough so as to make 
either a structure or subsort verification decidable. However, it is not clear how many steps 
of reduction must be performed on a given term in order for its sort to become sufficiently 
precise. 

Consider for example the classical subsort order for the integers, in which int is the greatest
sort and pos, zero and neg are its minimal subsorts, with the declarations

$0 : \to zero$
$pred : zero \cup neg \to neg$
$succ : zero \cup pos \to pos$

where 0 is a constant of sort zero, the symbol pred is a constructor of sort neg and the symbol
succ is a constructor of sort pos.

Let C be the characteristic function of natural numbers defined by the following rewrite rules. 

$C(x^{zero \cup pos}) = 1$
$C(x^{zero \cup neg}) = 0$

Let the order of appearance of these rules be significant as specified. That is, the second rule
is to be considered only if the first one is not applicable. Thus, this definition is equivalent to:

$C(x^{zero \cup pos}) = 1$
$C(x^{neg}) = 0$

Now, suppose a term of sort int with non-terminating evaluation given by the following
sequence of reductions: $t^{int} \to t_1^{zero \cup pos} \to t_2^{zero \cup pos} \to t_3^{zero \cup pos} \to \cdots$ With a strict evaluation
strategy, $C(t^{int})$ is not defined because $t^{int}$ has no denotation. Nevertheless, with a lazy
reduction scheme, $C(t^{int})$ is defined and equal to 1 since $t^{int}$ can be reduced in finitely many
steps just so far as necessary to ascertain that it is of sort $zero \cup pos$. Thus, there is no need
always to reduce terms to normal forms during the pattern matching process and pattern 
matching becomes a non-trivial problem deserving careful attention. 
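
The lazy behaviour described above can be sketched in Python. This is only an illustration under stated assumptions: sorts are modelled as sets of minimal sorts, a term under evaluation is modelled as a stream of the sorts known after each reduction step, and the names `diverging_term`, `C` and `fuel` are hypothetical, not part of the method.

```python
from itertools import islice

# Assumption of this sketch: a sort is the set of minimal sorts below it.
INT = frozenset({"pos", "zero", "neg"})
NAT = frozenset({"zero", "pos"})       # zero ∪ pos
NEG = frozenset({"neg"})

def diverging_term():
    """Sorts known after each reduction step of a term with no normal form:
    int first, then zero ∪ pos forever."""
    yield INT
    while True:
        yield NAT

def C(sort_steps, fuel=1000):
    """Characteristic function of the naturals, rules tried in textual order.
    Reduction stops as soon as the sort information decides a rule."""
    for sort in islice(sort_steps, fuel):
        if sort <= NAT:          # first rule decidably applies
            return 1
        if not (sort & NAT):     # first rule decidably fails, only neg remains
            return 0
    return None                  # undecided within the fuel budget
```

With this model, `C(diverging_term())` returns 1 after two steps even though the term is never reduced to a normal form.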

We restrict our interest to syntactic pattern matching.¹ Sorts are partially ordered. Minimal
sorts are assumed to be pairwise disjoint and non-minimal sorts are assumed to be the union 
of their subsorts. Functions can have more than one declaration [4]. We will also restrict our 
attention to linear LHS terms, i.e., without repeated variables. We lose no expressive power, 
though we lose some notational convenience. 

The order of rules defining a function is significant because LHS terms can be ambiguous; that 
is, they can be unifiable. Since we are considering deterministic languages, a list of terms with 
priority (a pattern) must be constructed. Thus, in order to avoid a more complicated syntax 
and a burden to the programmer, a disambiguating meta-rule will be necessary to construct such 
a list. Usually, this is according to the appearance of the rules in the text but any other priority 
will suffice [9]. This must naturally be taken into account when constructing the PMT. We
propose here an extension of the Puel-Suarez compilation scheme that accommodates
order-sorted constructor-based function definitions. Our compilation method eliminates ambiguous
patterns by introducing order-sorted constrained terms. Moreover, as order-sorted pattern
matching consists of two kinds of verification, discrimination trees are now more complex,
since some edges are labeled with sort restrictions.

Given a pattern matching problem S, the strict set of S is the set of terms for which every
PMT associated to S will fail to terminate and an optimal PMT is a PMT that will only fail to
terminate on the strict set of S. We show that optimality of an order-sorted PMT is a decidable
property equivalent to a generalization of the notions of strong sequentiality presented in [7] 
and [10]. Sequentiality of a pattern matching problem S is the possibility of systematically 
expanding any term step by step until either it matches a pattern of S or it is clear that a positive
matching is impossible. Our notion of sequentiality takes not only the structure of terms into
account, but also the sort system. We present a more general treatment of pattern matching
compilation in which the unsorted and many-sorted languages are particular cases.

¹ See [8] for a discussion of unification and matching in equational theories.

The paper is organized as follows. Section 2 presents unitary signatures and function 
definitions by order-sorted equations. Section 3 defines the syntax and semantics of
order-sorted constrained terms and patterns. Section 4 describes our compilation method. This
consists of three kinds of rules acting on constrained terms. Invariance and completeness of 
these rules are given. Finally, in Section 5, the new notion of sequentiality for order-sorted 
constrained terms is presented. We show that sequentiality and optimality of pattern-matching 
problems are equivalent. A brief description of order-sorted type systems covered by our work 
can be found in Section 6. 

2 Functions Defined by Order-Sorted Patterns 

All the conventional notions regarding substitutions, instantiation, and unification of unsorted 
terms are readily extended to order-sorted terms [14]. 

A signature $\Sigma = (S, \le, \mathcal{F}, \mathcal{C}, \mathcal{V}, \mathcal{D})$ consists of a set of sort symbols $S = \{\sigma, \eta, \delta, \ldots\}$,
a partial order $\le$ on $S$, a set of function symbols $\mathcal{F} = \{F, G, H, \ldots\}$, a set of constructor
symbols $\mathcal{C} = \{f, g, h, \ldots\}$, a set of $S$-indexed variables $\mathcal{V} = \{x^\sigma, y^\sigma, z^\sigma, \ldots\}$ with a countably
infinite number of variables for each sort symbol $\sigma$, and a set of declarations $\mathcal{D}$ of the form
$q : \sigma_1 \ldots \sigma_n \to \sigma$ where $q \in \mathcal{F} \cup \mathcal{C}$. We will call $\sigma_1 \ldots \sigma_n$ the domain of $q$ and $\sigma$ its codomain.
The sets $S$, $\mathcal{F}$, $\mathcal{C}$ and $\mathcal{V}$ are mutually disjoint. For brevity, we will write $s \in \Sigma$ for any symbol
$s$ in $S$, $\mathcal{F}$, $\mathcal{C}$, $\mathcal{V}$ or $\mathcal{D}$. We use $\bar{\sigma}, \bar{\eta}, \ldots$ to denote possibly empty sequences of sorts. The order
$\le$ is extended componentwise to sequences of the same length in $S^*$ and is also denoted $\le$.

$\Sigma$-terms are constructed in the usual manner with the additional constraint that they be
well-sorted. Formally, a variable $x^\sigma \in \Sigma$ is a well-sorted $\Sigma$-term of sort $\eta$ if $\sigma \le \eta$, and $q(t_1 \ldots t_n)$
is a well-sorted $\Sigma$-term of sort $\eta$ if and only if there exists a declaration $q : \sigma_1 \ldots \sigma_n \to \sigma \in \Sigma$
such that $\sigma \le \eta$ and for all $1 \le i \le n$, $t_i$ is a well-sorted $\Sigma$-term of sort $\sigma_i$. Where $\Sigma$ is
understood, we will refer simply to terms instead of $\Sigma$-terms.

A signature $\Sigma$ is called regular if all terms have a least sort. We may emphasize the fact that
a term $t$ has least sort $\sigma$ by writing it as $t^\sigma$. A signature $\Sigma$ is called unitary if it is regular and:

- $(S, \le)$ is a boolean lattice with least upper bound operation $\cup$, greatest lower bound
operation $\cap$, greatest element $\top$ and least element $\bot$.²

- No function or constructor declaration contains the sort symbol $\bot$.

- (Minimal codomain sort) If $f \in \mathcal{C}$, then there exists a declaration $f : \bar{\sigma} \to \sigma \in \Sigma$ and
$\sigma$ is a minimal sort (i.e., if $\delta \le \sigma$ then $\delta = \bot$ or $\delta = \sigma$).

- (Disjoint domain sort) If $f \in \mathcal{C}$ and $f : \sigma_1 \ldots \sigma_n \to \sigma \in \Sigma$ and $f : \eta_1 \ldots \eta_m \to \eta \in \Sigma$
are two different declarations of $f$ in $\Sigma$, then $n \ne m$, or $n = m \ge 1$ and $\bar{\sigma}$ and $\bar{\eta}$ are
disjoint (i.e., $\exists i$, $1 \le i \le n$, $\sigma_i \cap \eta_i = \bot$).

² A lattice $(S, \le)$ is said to be boolean iff $\forall \sigma \in S$, $\exists!\, \sigma^c \in S$ such that $\sigma \cap \sigma^c = \bot$ and $\sigma \cup \sigma^c = \top$.

In the sequel, we assume all signatures to be unitary. Motivations for considering such 
signatures are discussed in Section 6. 
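
As a small illustration of the boolean lattice requirement, the sketch below models sorts as sets of minimal sorts, so that the sort order is set inclusion. This set model and the names `powerset_lattice`, `complements` and `LATTICE` are assumptions made for the example only.

```python
from itertools import combinations

def powerset_lattice(minimal_sorts):
    """All sorts generated by the given minimal sorts (bottom is the empty set)."""
    atoms = list(minimal_sorts)
    return {frozenset(c) for r in range(len(atoms) + 1)
            for c in combinations(atoms, r)}

def complements(sort, lattice):
    """Sorts c with sort ∩ c = bottom and sort ∪ c = top."""
    top = frozenset().union(*lattice)
    return [c for c in lattice if not (sort & c) and (sort | c) == top]

LATTICE = powerset_lattice({"pos", "zero", "neg"})
```

In this model every sort has exactly one complement, for instance the complement of zero ∪ pos is neg.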

Constructor terms are those terms that do not contain function symbols. A term is called 
linear if its variables occur at most once and ground if it contains no variables. 

Pattern matching is a prefix ordering, $\sqsubseteq$, induced by instantiation on constructor terms
modulo variable renaming. Formally, $t \sqsubseteq t'$ iff $t' = \theta(t)$, where $\theta$ is a substitution. We say
that $t'$ matches $t$. Note that when only linear terms are on the left-hand side, then $x^\sigma \sqsubseteq t^\eta$ if
$\eta \le \sigma$, and $f(t_1 \ldots t_n) \sqsubseteq f(h_1 \ldots h_n)$ if and only if for all $i$ ($1 \le i \le n$), $t_i \sqsubseteq h_i$.
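
The two cases of this characterization translate directly into a recursive matcher. The following is a minimal sketch, assuming that sorts are modelled as sets of minimal sorts and that every term carries its least sort; the classes `Var` and `App` and the function `matches` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str
    sort: frozenset

@dataclass(frozen=True)
class App:
    con: str
    args: tuple
    sort: frozenset          # least sort of the term

def matches(pattern, term):
    """Return the substitution (a dict) if `term` is an instance of the
    linear `pattern`, and None otherwise."""
    if isinstance(pattern, Var):
        # x^sigma is matched by t^eta when eta <= sigma
        return {pattern.name: term} if term.sort <= pattern.sort else None
    if (not isinstance(term, App) or term.con != pattern.con
            or len(term.args) != len(pattern.args)):
        return None
    subst = {}
    for p, t in zip(pattern.args, term.args):
        s = matches(p, t)
        if s is None:
            return None
        subst.update(s)      # safe because the pattern is linear
    return subst

ZERO, POS, NEG = frozenset({"zero"}), frozenset({"pos"}), frozenset({"neg"})
NAT = ZERO | POS
zero = App("0", (), ZERO)
one = App("succ", (zero,), POS)
```

For example, `matches(App("succ", (Var("x", NAT),), POS), one)` binds `x` to `zero`, while a variable of sort neg does not match `one`.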

Unification is a least upper bound operation for $\sqsubseteq$. We will denote the least upper bound of
two $\Sigma$-terms $t$ and $t'$ by $t \sqcup t'$. Two terms are said to overlap, or to be ambiguous, if they are
unifiable. Regular signatures are finitary unifying and they make order-sorted term unification 
well-behaved (see [14] for a discussion). If in addition the signature is unitary, then a unique 
unifier is produced. 

A function definition is specified by a set of rewrite rules $\{F(t_i) = p_i\}_{i=1}^{n}$, where $F \in \mathcal{F}$, the
$t_i$'s are (possibly mutually ambiguous) linear constructor terms (the patterns of $F$) and each $p_i$
is a term containing no variables not in $t_i$. A program $\mathcal{P}$ is a set of function definitions.

3 Order-Sorted Constrained Terms 
3.1 Syntax 

For a signature $\Sigma$, we define the syntax and semantics of constructor $\Sigma$-terms, $\Sigma$-constraints,
constrained $\Sigma$-terms and $\Sigma$-patterns. We will drop the prefix $\Sigma$ where it is understood.

Let $t$ be a term, $l$ a linear term, $\sigma$ a sort and $\mathcal{T}$ and $\mathcal{F}$ the two logical constants denoting
truth and falsehood, respectively. Then $\mathcal{T}$, $\mathcal{F}$, $t : \sigma$ and $t \diamond l$ are atoms. A constraint is
recursively defined as an atom or as $C_1 \vee C_2$ or as $C_1 \wedge C_2$ where $C_1, C_2$ are constraints.
When $f : \sigma_1 \ldots \sigma_n \to \sigma \in \Sigma$ and $x_1 \ldots x_n$ are pairwise distinct variables, we may write an
atom $t \diamond f(x_1^{\sigma_1} \ldots x_n^{\sigma_n})$ as $t \diamond f^\sigma$, or $t \diamond f$ if the sorts are clear from the context. We will write
$t \diamond \{f_1, \ldots, f_n\}$ for a constraint $t \diamond f_1 \wedge \ldots \wedge t \diamond f_n$ and $A \in C$ for an atom $A$ of the constraint
$C$. The intended interpretation of a sort constraint $t : \sigma$ is that it is decidable that term $t$ has
sort $\sigma$, and the interpretation of a structure constraint $t \diamond l$ is that it is decidable that $t$ is
structurally different from $l$.

If $t$ is a constructor term and $C$ a constraint, then $t \mid C$ is a constrained term. If $t$ is linear
(resp. ground), then $t \mid C$ is a linear (resp. ground) constrained term. A pattern is a non-empty
list of linear constructor terms $p_1 \ldots p_n$. A constrained pattern is a non-empty list of linear
constrained terms $P_1 \ldots P_n$.

For brevity, we will refer to either a term or a constraint as an object. The set of free variables
of an object $h$, denoted $V(h)$, is defined as expected except that $V(t \diamond l) = V(t : \sigma) = V(t)$.
Substitution, denoted $\theta(h)$, is also defined as expected except that $\theta(t \diamond l) = \theta(t) \diamond l$ and
$\theta(t : \sigma) = \theta(t) : \sigma$.

We will also refer to the restriction of a constraint $C$ to a set of variables $V$, denoted $C|_V$. For
atomic constraints $C$, $C|_V = C$ if $V(C) \subseteq V$, otherwise $C|_V = \mathcal{T}$. For constraints formed
from $\vee$ and $\wedge$, the restriction distributes through to their arguments.

3.2 Semantics 

With a lazy reduction scheme, functions can yield a result even when applied to arguments 
whose evaluation is non-terminating. A new element is thus necessary to give semantics to 
such functions. We will introduce a new symbol $\bullet^\sigma$ for each sort $\sigma$ in the signature. Formally,
an augmented signature $\Sigma^\bullet$ is a unitary signature $\Sigma$ without variables and without function
symbols $\{F, G, H, \ldots\}$ but with a 0-ary constructor $\bullet^\sigma$ for each sort symbol $\sigma \in \Sigma$ different
from $\bot$. $\bullet^\sigma$ denotes those terms of sort $\sigma$ that cannot be reduced to a term having a constructor
symbol at the root (so-called head-normal form). Note that all $\Sigma^\bullet$-terms are ground.

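The free order-sorted term algebra on the signature $\Sigma^\bullet$ is denoted by $T_{\Sigma^\bullet}$. An interpretation of
a signature $\Sigma$ over its $\Sigma^\bullet$-term algebra $T_{\Sigma^\bullet}$ satisfies:

- $\sigma^{T_{\Sigma^\bullet}} := \{s \mid s$ is a $\Sigma^\bullet$-term of sort $\sigma\}$

- $\bot^{T_{\Sigma^\bullet}}$ is the empty set and $\top^{T_{\Sigma^\bullet}}$ the universe of $T_{\Sigma^\bullet}$

- $\sigma \le \eta$ implies $\sigma^{T_{\Sigma^\bullet}} \subseteq \eta^{T_{\Sigma^\bullet}}$

- If $f$ is a constructor and $f : \sigma_1 \ldots \sigma_n \to \sigma \in \Sigma$, then $f^{T_{\Sigma^\bullet}}$ is a function
$\sigma_1^{T_{\Sigma^\bullet}} \times \ldots \times \sigma_n^{T_{\Sigma^\bullet}} \to \sigma^{T_{\Sigma^\bullet}}$ such that $f^{T_{\Sigma^\bullet}}(s_1 \ldots s_n) = f(s_1 \ldots s_n)$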

Sorts and subsorts sometimes allow us to decide if a term matches a pattern even if its 
evaluation is non-terminating. We show how to characterize such nice terms. 

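We associate to each constructor term $t$ three disjoint sets of $\Sigma^\bullet$-terms such that the union of
these three sets is the $T_{\Sigma^\bullet}$-algebra. The first one, denoted by $[\![t]\!]_{\mathcal{T}}$ or simply by $[\![t]\!]$ and called
the denotation or solution of $t$, is the set of $\Sigma^\bullet$-terms that are instances of $t$. The second one,
denoted by $[\![t]\!]_{\mathcal{U}}$ and called the uncertain or strict set of $t$, is the set of all $\Sigma^\bullet$-terms for which
we cannot decide if they are instances of $t$. The last, $[\![t]\!]_{\mathcal{F}}$, is the set of $\Sigma^\bullet$-terms that are not
instances of $t$. Formally,

- $[\![t]\!]_{\mathcal{T}} = \{\theta(t) \mid \theta$ is a $(\mathcal{V}, \Sigma^\bullet)$-assignment$\}$

- $[\![t]\!]_{\mathcal{U}}$ is defined by recursion as:

$[\![x^\sigma]\!]_{\mathcal{U}} = \{\bullet^\eta \mid \eta \cap \sigma \ne \bot$ and $\eta \not\le \sigma\}$

$[\![f(t_1 \ldots t_n)^\sigma]\!]_{\mathcal{U}} = \{\bullet^\eta \mid \eta \ge \sigma\} \cup \{f(a_1 \ldots a_n) \mid \exists i\ a_i \in [\![t_i]\!]_{\mathcal{U}}$ and $\forall j\ a_j \notin [\![t_j]\!]_{\mathcal{F}}\}$

In both cases we can decide that $\bullet^\eta$ does not match $t^\sigma$ when $\eta$ and $\sigma$ are disjoint. In
the first case, we can decide also that $\bullet^\eta$ matches $x^\sigma$ when $\eta \le \sigma$. Under the opposite
conditions, we cannot decide and then $\bullet^\eta \in [\![x^\sigma]\!]_{\mathcal{U}}$. In the second case, $\sigma$ is a minimal
sort (we are dealing with unitary signatures) and so, every $\bullet^\eta$ with $\eta$ comparable with
$\sigma$ (i.e. $\eta \ge \sigma$) belongs to the strict set of $t^\sigma$. Note also that we can decide that a term
with the same constructor symbol does not match if at least one of its arguments does
not match.

- $[\![t]\!]_{\mathcal{F}} = T_{\Sigma^\bullet} - [\![t]\!]_{\mathcal{T}} - [\![t]\!]_{\mathcal{U}}$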
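
The three sets can be read as a three-way classification of a ground term against a constructor term. The sketch below illustrates this on the integer signature of the introduction. It assumes that sorts are sets of minimal sorts, that each constructor has a single declaration, and that input terms are well-sorted; the names `classify`, `bullet`, `app`, `var` and `CODOMAIN` are hypothetical.

```python
T, U, F = "T", "U", "F"      # denotation, uncertain (strict) set, decidable failure

ZERO, POS, NEG = frozenset({"zero"}), frozenset({"pos"}), frozenset({"neg"})
NAT, INT = ZERO | POS, ZERO | POS | NEG
CODOMAIN = {"0": ZERO, "succ": POS, "pred": NEG}   # minimal codomain sorts

def bullet(sort): return ("•", frozenset(sort))    # unevaluated term of a sort
def app(con, *args): return (con, args)            # constructor application
def var(sort): return ("var", frozenset(sort))     # pattern variable

def sort_of(term):
    head, rest = term
    return rest if head == "•" else CODOMAIN[head]

def classify(pattern, term):
    """Return T, U or F for a ground term with bullets against a linear pattern."""
    if pattern[0] == "var":
        s, p = sort_of(term), pattern[1]
        if s <= p:
            return T
        if term[0] == "•" and s & p:   # overlapping but not included: undecided
            return U
        return F
    con, pargs = pattern
    if term[0] == "•":                 # undecided iff the bullet sort covers f's codomain
        return U if CODOMAIN[con] <= term[1] else F
    if term[0] != con or len(term[1]) != len(pargs):
        return F
    results = [classify(p, a) for p, a in zip(pargs, term[1])]
    if F in results:                   # one failing argument decides the failure
        return F
    return U if U in results else T
```

For instance, `classify(app("succ", var(NAT)), app("succ", bullet(INT)))` is undecided, whereas `classify(app("succ", var(NAT)), bullet(NEG))` is a decidable failure.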



Example 3.1 Consider the following subsort order: 



S 




$b : \to \eta$    $a : \to \delta$

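$[\![f(x^\sigma, b)]\!]_{\mathcal{T}} = \{f(b, b), f(\bullet^\eta, b), f(\bullet^\sigma, b), f(a, b), \ldots\}$

$[\![f(x^\sigma, b)]\!]_{\mathcal{U}} = \{f(b, \bullet^\eta), f(\bullet^\eta, \bullet^\sigma), f(a, \bullet^\sigma), \ldots\}$
$[\![f(x^\sigma, b)]\!]_{\mathcal{F}} = \{b, \bullet^\eta, f(a, a), \ldots\}$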



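Proposition 1 $[\![t]\!]_{\mathcal{T}}$ and $[\![t]\!]_{\mathcal{U}}$ are disjoint, $[\![x^\sigma]\!] = \sigma^{T_{\Sigma^\bullet}}$, and $[\![t_1 \sqcup t_2]\!] = [\![t_1]\!] \cap [\![t_2]\!]$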



To each constraint $C$, we associate a three-valued truth value (true, false or uncertain)
under a $(V(C), \Sigma^\bullet)$-assignment $\theta$, denoted $[\![C]\!]_\theta$. The important cases are defined as follows.
The other cases follow standard three-valued logic with A being the minimum of its arguments 
and V the maximum (false < uncertain < true). 



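$[\![t \diamond l]\!]_\theta = \begin{cases} \text{true} & \text{if } \theta(t) \in [\![l]\!]_{\mathcal{F}} \\ \text{false} & \text{if } \theta(t) \in [\![l]\!]_{\mathcal{T}} \\ \text{uncertain} & \text{if } \theta(t) \in [\![l]\!]_{\mathcal{U}} \end{cases}$

$[\![t : \sigma]\!]_\theta = \begin{cases} \text{true} & \text{if } \theta(t) \in \sigma^{T_{\Sigma^\bullet}} \\ \text{false} & \text{if } \theta(t) \text{ is of sort } \eta \text{ and } \eta \cap \sigma = \bot \\ \text{uncertain} & \text{otherwise} \end{cases}$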
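
A minimal sketch of this three-valued evaluation follows. It assumes sorts modelled as sets of minimal sorts and represents an instantiated term only by its least sort; `Truth`, `conj`, `disj` and `sort_atom` are hypothetical names.

```python
from enum import IntEnum

class Truth(IntEnum):
    FALSE = 0
    UNCERTAIN = 1
    TRUE = 2

def conj(*values):
    """Conjunction is the minimum in the order false < uncertain < true."""
    return min(values)

def disj(*values):
    """Disjunction is the maximum in the same order."""
    return max(values)

def sort_atom(term_sort, sigma):
    """Value of the atom t : sigma when the least sort of the instantiated t is term_sort."""
    if term_sort <= sigma:
        return Truth.TRUE
    if not (term_sort & sigma):
        return Truth.FALSE
    return Truth.UNCERTAIN

ZERO, POS, NEG = frozenset({"zero"}), frozenset({"pos"}), frozenset({"neg"})
NAT, INT = ZERO | POS, ZERO | POS | NEG
```

A term known only to be of sort int makes `sort_atom(INT, NAT)` uncertain, and a conjunction with a false atom stays false whatever the other atoms are.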



Example 3.2 With the subsort order of Example 3.1 : 

[[x* : <tJ r JC * < = uncertain [x* : £]] ^ = true 

W(b, b) O aj e = true Ex* <>/(&, ft)]] ^ x s < — — uncertain 



To each constrained term $t \mid C$, we associate three disjoint sets of $\Sigma^\bullet$-terms. The first one,
denoted by $[\![t \mid C]\!]_{\mathcal{T}}$ or simply by $[\![t \mid C]\!]$ and called the denotation or solution of $t \mid C$, is the
set of $\Sigma^\bullet$-terms that are instances of $t$ and satisfy $C$. The second one, denoted $[\![t \mid C]\!]_{\mathcal{U}}$ and
called the uncertain or strict set, is the set of $\Sigma^\bullet$-terms for which we cannot decide if they are
instances of $t$ or if they satisfy $C$. The third one, denoted $[\![t \mid C]\!]_{\mathcal{F}}$, is the set of $\Sigma^\bullet$-terms that
are not instances of $t$ or do not satisfy $C$. Formally,

- $[\![t \mid C]\!]_{\mathcal{T}} = \{\theta(t) \mid \theta$ is a $(V(t), \Sigma^\bullet)$-assignment and $[\![C|_{V(t)}]\!]_\theta = $ true$\}$

- $[\![t \mid C]\!]_{\mathcal{U}} = \{\theta(t) \mid \theta$ is a $(V(t), \Sigma^\bullet)$-assignment and $[\![C|_{V(t)}]\!]_\theta = $ uncertain$\} \cup [\![t]\!]_{\mathcal{U}}$

- $[\![t \mid C]\!]_{\mathcal{F}} = T_{\Sigma^\bullet} - [\![t \mid C]\!]_{\mathcal{U}} - [\![t \mid C]\!]_{\mathcal{T}}$

A constrained term is consistent if it denotes a nonempty set. If $C_1 \vee \ldots \vee C_n$ is the disjunctive
normal form of $C$, the denotation of $t \mid C$ is the union of the denotations of $t \mid C_1 \ldots t \mid C_n$.

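Example 3.3 Let $T = f(x^\sigma, y^\sigma) \mid x^\sigma : \eta \wedge y^\sigma \diamond q(a, z^\beta)$. With the subsort order of
Example 3.1:

$[\![T]\!]_{\mathcal{T}} = \{f(b, b), f(\bullet^\eta, b), \ldots\}$
$[\![T]\!]_{\mathcal{U}} = \{f(\bullet^\sigma, b), f(b, \bullet^\sigma), \ldots\}$
$[\![T]\!]_{\mathcal{F}} = \{f(a, q(a, \bullet^\beta)), b, a, \ldots\}$

Proposition 2 The following equivalences will be used where required:

- $[\![x^\eta]\!] = [\![x^\sigma \mid x^\sigma : \eta]\!]$ if $\sigma \ge \eta$

- $[\![t]\!]_{\mathcal{F}} = [\![x^\top \mid x^\top \diamond t]\!]_{\mathcal{T}}$

- $[\![t \mid C_1 \vee C_2]\!] = [\![t \mid C_1]\!] \cup [\![t \mid C_2]\!]$ and $[\![t \mid C_1 \wedge C_2]\!] = [\![t \mid C_1]\!] \cap [\![t \mid C_2]\!]$

- $[\![t \mid \mathcal{F}]\!] = \{\}$, $[\![t \mid \mathcal{T}]\!] = [\![t]\!]$ and, if $t$ is a ground constructor term, then $[\![t]\!] = \{t\}$

To each constrained pattern $P_1 \ldots P_n$ (and thus in particular to each pattern), we associate three
sets of $\Sigma^\bullet$-terms. The solution or denotation of $P_1 \ldots P_n$, denoted by $[\![P_1 \ldots P_n]\!]_{\mathcal{T}}$ or simply
by $[\![P_1 \ldots P_n]\!]$, is the set of $\Sigma^\bullet$-terms $t$ for which there exists a $P_i$ such that $t$ matches $P_i$ and
it is decidable that $t$ does not match $P_k$, $k < i$. The uncertain or strict set of a constrained
pattern $P_1 \ldots P_n$, denoted $[\![P_1 \ldots P_n]\!]_{\mathcal{U}}$, is the set of $\Sigma^\bullet$-terms $t$ such that there exists a $P_i$ for
which we cannot decide whether $t$ matches $P_i$ and $t$ is not in the denotation of any preceding
prefix $P_1 \ldots P_k$, $k < i$, of the pattern. The last one, denoted $[\![P_1 \ldots P_n]\!]_{\mathcal{F}}$, is the set of $\Sigma^\bullet$-terms
that, decidably, are not solutions of $P_1 \ldots P_n$. Formally:

- $[\![P_1 \ldots P_n]\!]_{\mathcal{T}} = \{t \mid \exists i\ (1 \le i \le n)\ t \in [\![P_i]\!]_{\mathcal{T}}$ and $\forall k\ (k < i)\ t \in [\![P_k]\!]_{\mathcal{F}}\}$

- $[\![P_1 \ldots P_n]\!]_{\mathcal{U}} = \{t \mid \exists i\ (1 \le i \le n)\ t \in [\![P_i]\!]_{\mathcal{U}}$ and $\forall k\ (k < i)\ t \notin [\![P_1 \ldots P_k]\!]_{\mathcal{T}}\}$

- $[\![P_1 \ldots P_n]\!]_{\mathcal{F}} = T_{\Sigma^\bullet} - [\![P_1 \ldots P_n]\!]_{\mathcal{T}} - [\![P_1 \ldots P_n]\!]_{\mathcal{U}}$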
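
The priority semantics of a constrained pattern can be sketched as a function of the individual verdicts of $P_1 \ldots P_n$ on a fixed term. This is only an illustration: the verdicts are assumed to be given as the strings "T", "U" or "F", and the names `in_T`, `in_U` and `classify_pattern` are hypothetical.

```python
def in_T(results):
    """Some P_i matches and every earlier P_k decidably fails."""
    return any(r == "T" and all(q == "F" for q in results[:i])
               for i, r in enumerate(results))

def in_U(results):
    """Some P_i is undecided and no proper prefix P_1..P_k (k < i) already matches."""
    return any(r == "U" and not any(in_T(results[:k]) for k in range(1, i + 1))
               for i, r in enumerate(results))

def classify_pattern(results):
    """Verdict of the whole ordered pattern, given the verdicts of P_1..P_n."""
    if in_T(results):
        return "T"
    return "U" if in_U(results) else "F"
```

For example, `classify_pattern(["U", "T"])` is "U": the second term matches, but the first, which has priority, is still undecided.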



Example 3.4 Consider the constrained pattern: 



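$P_1, P_2 = f(x^\sigma, b) \mid x^\sigma : \delta,\ \ f(y^\sigma, z^\sigma) \mid z^\sigma : \beta \wedge z^\sigma \diamond a$

With the subsort order of Example 3.1:

$f(\bullet^\sigma, b) \in [\![P_2]\!]_{\mathcal{T}}$ and $f(\bullet^\sigma, b) \notin [\![P_1 P_2]\!]_{\mathcal{T}}$, since $f(\bullet^\sigma, b) \in [\![P_1]\!]_{\mathcal{U}}$. Hence, $f(\bullet^\sigma, b) \in [\![P_1 P_2]\!]_{\mathcal{U}}$.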

4 Compilation Rules 

In this section, we describe our compilation method. This consists of three kinds of rules acting 
on constrained terms. The simplification rules transform restrictions on terms into restrictions 
on variables. Partitioning transforms an ambiguous order-sorted pattern into an equivalent 
set (modulo simplification) of disjoint order-sorted constrained terms. The normalization 
rules transform a set of disjoint order-sorted constrained terms into a set of simpler ones that 
facilitates the construction of the pattern discrimination tree. 

4.1 Simplification Rules 

The simplification rules define a reduction relation $\to_s$ on constraints that transforms a
structure or sort constraint on terms into an equivalent constraint on variables, that is, either $\mathcal{T}$,
$\mathcal{F}$, or of the form $x \diamond t$ or $x : \sigma$. Figure 1 presents the simplification rules. Most are derived
from [10, 3, 11] and are self-explanatory. The interesting ones are rules 14 and 15.

The complete sort rule allows us to simplify several structural constraints to a single sort
constraint. It states that a term does not match any of the constructors of a sort $\delta$ if and only if it
is not of sort $\delta$. Note that this rule is only applicable when the constructors of $\delta$ are finite. The
negative sort rule states that any term $t$, which is of sort $\sigma$ but not of sort $\eta$, is of sort $\sigma - \eta$,
where $\sigma - \eta = \sigma \cap \eta^c$.³ We will write as $\sigma - \{\eta_1, \ldots, \eta_k\}$ the sort $(\ldots(\sigma - \eta_1) - \ldots) - \eta_k$.

If $x^\sigma : \eta$ appears in a constraint $C$, then the variable $x^\sigma$ is said to be restricted by $\eta$ in
$C$, otherwise it is restricted by $\sigma$. We shall say that a constraint $C$ is in simplified form
(irreducible by $\to_s$), denoted $C{\downarrow_s}$, if and only if it is either $\mathcal{T}$ or $\mathcal{F}$ or

- If $x^\sigma : \eta$ is in $C$ then $\sigma > \eta$ and $\eta \ne \bot$

- If $x^\sigma \diamond t^\eta$ is in $C$ then $t$ is not a variable and $\sigma \ge \eta$

- If $x \diamond \{f_1, \ldots, f_n\} \in C$ and $f_i : \bar{\rho}_i \to \delta_i \in \Sigma$ ($i = 1 \ldots n$), then $f_1, \ldots, f_n$ are not all the
constructors of $\{\delta_1 \ldots \delta_n\}$

Example 4.1 The constraint $x^\sigma : \eta \wedge y^\delta \diamond f(a, a)$ is in simplified form while $x^\sigma \diamond z^\sigma$ and
$y^\delta \diamond \{p, q\}$ are not.



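³ Note that $(\sigma - \eta) \le \sigma$ and $(\sigma - \eta) \cap \eta = \bot$.

Structures

1. $f(t_1 \ldots t_n)^\sigma \diamond f(h_1 \ldots h_n)^\sigma \to_s \bigvee_{1 \le i \le n} t_i \diamond h_i$

2. $f(t_1 \ldots t_n) \diamond g(h_1 \ldots h_m) \to_s \mathcal{T}$

3. $f() \diamond f() \to_s \mathcal{F}$

4. $t^\sigma \diamond l^\eta \to_s \mathcal{T}$ if $\eta \cap \sigma = \bot$

Conjunction and Disjunction

5. $t \diamond l \wedge t \diamond l' \to_s t \diamond l$ if $l \sqsubseteq l'$

6. $t \diamond l \vee t \diamond f(\bar{h}) \to_s t \diamond l \sqcup f(\bar{h})$ if $l \sqcup f(\bar{h})$ exists

7. $x : \sigma \wedge x \diamond t^\eta \to_s x : \sigma$ if $\sigma \cap \eta = \bot$

8. $x : \sigma \wedge x : \eta \to_s x : \sigma \cap \eta$

9. $x : \sigma \vee x : \eta \to_s x : \eta \cup \sigma$ if $\sigma$ and $\eta$ are comparable

Positive Sorts

10. $t : \bot \to_s \mathcal{F}$

11. $t : \top \to_s \mathcal{T}$

12. $t^\sigma : \eta \to_s \mathcal{T}$ if $\sigma \le \eta$

13. $t^\sigma : \eta \to_s t^\sigma : \sigma \cap \eta$ if $\sigma$ and $\eta$ are not comparable

Negative Sorts

14. $t^\sigma \diamond y^\eta \to_s t^\sigma : \sigma - \eta$

Complete Sort

15. $\bigwedge_{i=1}^{n} x^\sigma \diamond f_i \to_s x^\sigma : \sigma - \{\delta_i\}_{i=1}^{n}$ if $f_i : \bar{\rho}_i \to \delta_i \in \Sigma$ and the $f_i$'s
are all the constructors of $\{\delta_i\}_{i=1}^{n}$

Figure 1: Simplification Rules

Theorem 1 (Simplification) Let $t \mid C$ be a constrained term.
(Invariance) If $C \to_s C'$ then $[\![t \mid C]\!] = [\![t \mid C']\!]$
(Termination) There are no infinite chains $C_1 \to_s C_2 \to_s \cdots$
(Completeness) For each constrained term $t \mid C$ where $C$ is not in simplified form,
there exists a $C'$ such that $C \to_s C'$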

Proof: (Sketch) The invariance claim must be verified for each rule, a tedious task, since there are
so many cases, but straightforward. Completeness can be easily shown by considering a constrained
term whose constraint is not in simplified form. To prove the termination claim, we use the lexicographic
ordering over $(C_1, C_2, C_3, C_4)$, where:

- $C_1 = \sum_{(t \diamond l) \in C} size(t)$, where $size(t)$ is defined as one would expect,

- $C_2$ is the number of structure atoms in $C$,

- $C_3$ is the number of sort atoms in $C$ and,

- $C_4 = \sum_{(t : \sigma) \in C} path(\sigma)$, where $path(\sigma)$ is the length of the maximal path from $\bot$ to $\sigma$ in the
sort lattice.

Rules 1, 2, 3, 4 decrease $C_1$; rules 5, 6, 7, 14, 15 decrease $C_2$; rules 8, 9, 10, 11, 12 decrease $C_3$; and
rule 13 decreases $C_4$. When one rule decreases $C_i$, $C_k$ ($k < i$) does not change. Thus, the complexity
with respect to the lexicographic ordering is always reduced and the length of a $\to_s$ derivation is bounded.
∎
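
The rules of Figure 1 that act on a single sort atom (rules 10 to 13) and the sort difference used by rule 14 can be sketched as follows. The sketch assumes sorts modelled as sets of minimal sorts, so that $\sigma - \eta$ is plain set difference; `simplify_sort_atom` and `minus` are hypothetical names.

```python
BOT = frozenset()
ZERO, POS, NEG = frozenset({"zero"}), frozenset({"pos"}), frozenset({"neg"})
NAT, NONPOS, INT = ZERO | POS, ZERO | NEG, ZERO | POS | NEG

def minus(sigma, eta):
    """sigma - eta, i.e. sigma intersected with the complement of eta."""
    return sigma - eta

def simplify_sort_atom(sigma, eta):
    """Simplify the atom t^sigma : eta.
    Returns True, False, or the sort of the residual atom t^sigma : eta'."""
    if not eta:                  # rule 10: t : bottom is false
        return False
    if sigma <= eta:             # rules 11 and 12: sigma <= eta is true
        return True
    if not (eta <= sigma):       # rule 13: incomparable sorts, intersect and retry
        return simplify_sort_atom(sigma, sigma & eta)
    return eta                   # eta strictly below sigma: already simplified
```

For instance, the atom $t^{zero \cup pos} : zero \cup neg$ simplifies to $t^{zero \cup pos} : zero$, and `minus(INT, NAT)` is the sort neg, which is below int and disjoint from zero ∪ pos.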

4.2 Partitioning 

The definition of a pattern's denotation (Section 3.2) suggests splitting a pattern into an 
equivalent set of constrained terms, whose set denotations are disjoint, and whose union is the 
set denotation of the pattern. 

Let $T_1 = t_1 \mid C_1$ and $T_2 = t_2 \mid C_2$ be two constrained terms. We say $T_2$ matches $T_1$, denoted
$T_1 \sqsubseteq T_2$, iff there exists a substitution $\theta$ such that $t_1 \sqsubseteq t_2$ (i.e. $t_2 = \theta(t_1)$) and $C_2 \Rightarrow \theta(C_1)$,
where $\Rightarrow$ is logical implication. A substitution $\theta$ unifies two constrained terms $T_1$ and $T_2$ if
and only if $\theta$ unifies $t_1$ and $t_2$ and the constrained term $t_1 \sqcup t_2 \mid \theta(C_1 \wedge C_2)$ is consistent. If $\theta$
unifies $T_1$ and $T_2$, then $T_1 \sqcup T_2 = t_1 \sqcup t_2 \mid \theta(C_1 \wedge C_2)$ is the least upper bound with respect to $\sqsubseteq$
and we say that $T_1$ and $T_2$ are compatible constrained terms.

Let $T = t \mid C$ be a constrained term. The restriction of $T$ under a substitution $\theta$, denoted $T|_\theta$,
is defined to be $t \mid C'$ where $C \wedge t \diamond \theta(t) \to_s^* C'$.

Proposition 3 $[\![T_1 \sqcup T_2]\!] = [\![t_1 \mid C_1]\!] \cap [\![t_2 \mid C_2]\!]$ and $[\![\theta(t)]\!] \cap [\![T|_\theta]\!] = \emptyset$

The recursive function $\mathcal{PART}$ takes a constrained term $T$ and a pattern $p_1 \ldots p_n$ as arguments,
and partitions $T$ according to $p_1 \ldots p_n$ into a set of constrained terms whose denotations
are disjoint.⁴ To illustrate, suppose we wish to partition $x^\top \mid \mathcal{T}$ according to the pattern
$p_1 \ldots p_n$. The first set generated is $[\![p_1]\!]$ and we go on to recursively partition the decidable
$x^\top$-complement of $p_1$, that is, $x^\top \mid x^\top \diamond p_1$, according to the rest of the pattern $p_2 \ldots p_n$. Note
that the order of the pattern is respected.

$\mathcal{PART}(T, []) = \emptyset$

$\mathcal{PART}(T, p_1 \ldots p_n) = \begin{cases} \mathcal{PART}(T, p_2 \ldots p_n) & \text{if } T \text{ and } p_1 \mid \mathcal{T} \text{ are not unifiable;} \\ \{T \sqcup (p_1 \mid \mathcal{T})\} \cup \mathcal{PART}(T|_\theta, p_2 \ldots p_n) & \text{otherwise, where } \theta \text{ is the mgu of } T \text{ and } p_1 \mid \mathcal{T}. \end{cases}$

⁴ In fact, we mean "partitions the set denotation of $T$" but we shall say simply "partitions $T$".



Example 4.2 With the subsort order of Example 3.1, let $g$ be a constructor whose domain
sort is $(\sigma \cup \delta) \times nat$, where $\sigma \cup \delta$ and $nat$ are disjoint sorts. Partitioning $x^\top \mid \mathcal{T}$ according to
the pattern $g(x^\sigma, 0)$, $g(y^\delta, z^{nat})$, $x^\top$ yields:

$\{g(x^\sigma, 0),\ g(y^\delta, z^{nat}) \mid y^\delta : \phi,\ g(y^\delta, z^{nat}) \mid z^{nat} \diamond 0,\ x^\top \mid x^\top \diamond g(x^\sigma, 0) \wedge x^\top \diamond g(y^\delta, z^{nat})\}$

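Proposition 4 If $\mathcal{PART}(x^\top \mid \mathcal{T}, p_1 \ldots p_n) = \{P_1, \ldots, P_n\}$, then $P_i = p_i \mid \bigwedge_{j<i} p_i \diamond p_j$ and
$[\![P_i]\!] \cap [\![P_j]\!] = \emptyset$ for $i \ne j$.

Proof: (Sketch) By induction on $i$, using the fact that if the substitution $\theta_i$ is defined by $\theta_i(x^\top) = p_i$
for all $1 \le i \le n$, then $(\ldots(x^\top \mid \mathcal{T})|_{\theta_1} \ldots)|_{\theta_n} = x^\top \mid x^\top \diamond p_1 \wedge \ldots \wedge x^\top \diamond p_n$. ∎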

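Theorem 2 (Partitioning) Let $p_1 \ldots p_n$ be a pattern such that $\mathcal{PART}(x^\top \mid \mathcal{T}, p_1 \ldots p_n) =
\{P_1, \ldots, P_n\}$, then

- $[\![p_1 \ldots p_n]\!]_{\mathcal{T}} \subseteq \bigcup_{1 \le i \le n} [\![P_i]\!]_{\mathcal{T}}$

- If $t \in \bigcup_{1 \le i \le n} [\![P_i]\!]_{\mathcal{T}}$, then $\exists j$ such that $t \in [\![p_j]\!]_{\mathcal{T}}$ and $t \in [\![p_k]\!]_{\mathcal{F}}$, for $k < j$

Proof: (Sketch) The first claim is shown by induction on $n$. The second one is shown by
Proposition 4 and the second item of Proposition 2. To show the last claim, it suffices to prove that
both $[\![p_1 \ldots p_n]\!]_{\mathcal{F}}$ and $[\![P_1 \ldots P_n]\!]_{\mathcal{F}}$ are equal to $\bigcap_{1 \le k \le n} [\![p_k]\!]_{\mathcal{F}}$. ∎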



Thus, partitioning transforms an ambiguous list of order-sorted terms (a pattern) into an 
unambiguous set of order-sorted constrained terms. It changes neither the decidable sets 
associated to each pattern nor the strict one. We say that $\{S_1, \ldots, S_n\}$ is a complete
decomposition if and only if there exists a pattern $p_1 \ldots p_n$ such that partitioning $x^\top \mid \mathcal{T}$
according to the list $[p_1 \ldots p_n\ x^\top]$ yields $\{S_1, \ldots, S_n\}$. When $x^\top$ is appended to the list of
patterns, both the success and the failure of the matching are considered. This does not change
the original problem because the discrimination tree covers all the cases that may appear during
the pattern matching process. It turns out that when $\{S_1, \ldots, S_n\}$ is a complete decomposition
and $t \in [\![S_j]\!]_{\mathcal{F}}$, there exists $1 \le i \le n$ such that $t \in [\![S_i]\!]_{\mathcal{T}}$.
4.3 Normalization

In order to simplify the construction of the PMT, we use four normalization rules (flattening, 
sort, structure, empty) that operate over sets of disjoint constrained terms. Since the 
formalization of these rules is quite complex, while the underlying idea is rather intuitive, we 
first show three typical examples where such normalizations are to be performed. 

The first rule decomposes a complex structural constraint into several simpler ones with only 
one constructor in the right-hand side of each structure atom. Normalizing $x^\top \mid x^\top \diamond cons(1, z)$
with the flattening rule yields $x^\top \mid x^\top \diamond cons(\_, \_)$ and $cons(y, z) \mid y \diamond 1$.

Now, consider the constrained term $x^\top \mid x^\top \diamond cons(y, z)$. Recall that, with a lazy evaluation
mechanism, the sort of a well-sorted expression can be known before its structure. Thus, if $x^\top$
is substituted by a term whose sort is incompatible with that of $cons(y, z)$, it is not necessary to
reduce the term. On the other hand, if $x^\top$ is substituted by a list expression, it is necessary to
continue reduction in order to decide whether it is a "cons" term. Thus, assuming that list and
int are the only sorts in the system, the sort rule transforms $x^\top \mid x^\top \diamond cons(y, z)$ into $x^\top \mid x^\top : int$,
$x^\top \mid x^\top : list \wedge x^\top \diamond cons(y, z)$.

If a variable has a common occurrence in more than one constrained term and is restricted
by different structural constraints, then the structure rule can be applied. For example,
$f(a, x^{int}) \mid x^{int} \diamond 1$ and $f(b, x^{int}) \mid x^{int} \diamond 2$ is transformed into $f(a, 2) \mid \mathcal{T}$, $f(b, 1) \mid \mathcal{T}$,
$f(a, x^{int}) \mid x^{int} \diamond 1 \wedge x^{int} \diamond 2$ and $f(b, x^{int}) \mid x^{int} \diamond 1 \wedge x^{int} \diamond 2$. Now, the subterm at
position 2 is a variable restricted by the same set of symbols or it is just one of such symbols.

We shall say that $t \mid C$ is in normalized form, if and only if $C$ is in simplified form but is
different from $\mathcal{F}$ and whenever $x \diamond f(t_1 \ldots t_n)^\sigma$ appears in $C$ and $f : \sigma_1 \ldots \sigma_n \to \sigma \in \Sigma$,
then $x$ is restricted by $\sigma$ in $C$ and $t_1 \ldots t_n$ are mutually distinct variables of sort $\sigma_1 \ldots \sigma_n$
respectively. We shall say that a set of constrained terms $\{t_1 \mid C_1, \ldots, t_n \mid C_n\}$ is in normalized
form (irreducible by $\to_n$), if and only if every $t_i \mid C_i$ is in normalized form and, whenever there
exist two terms $t_i$ and $t_j$ and there exists a position $u$ such that for all $v <_{pos} u$, $t_i$ and $t_j$ have
the same structure symbol at position $v$, $t_i/u$ and $t_j/u$ are variables restricted by the same sort
and by nonempty sets $s_i$ and $s_j$ of structure symbols in $C_i$ and $C_j$ respectively, then $s_i = s_j$.

Example 4.3 The constrained term /i(x cr ,y*)|v*0/(v cr , w a ) is in normalized form while 

are not. 

Figure 2 presents the normalization rules. There, we assume $f : \sigma_1 \ldots \sigma_n \to \sigma \in \Sigma$ and 
$x_1^{\sigma_1}, \ldots, x_n^{\sigma_n}$ are pairwise distinct variables. When $\theta$ is a substitution, $(t \mid C)((\theta))$ denotes the 
term $\theta(t) \mid \theta(C)\downarrow_s$. When normalizing, rules are applied in the order that they appear in 
Figure 2. They satisfy the properties of Theorem 1; namely, termination, invariance and 
completeness. 

Theorem 3 (Normalization) Let S be a set of constrained terms. 
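Flattening: If $\exists i$ ($1 \le i \le n$) such that $t_i$ is not a variable of sort $\sigma_i$, then 

    $S \cup \{\, t \mid x \diamond f(t_1 \ldots t_n)^\sigma \wedge C \,\} \;\to_n$ 
    $S \cup \{\, t \mid x \diamond f \wedge C,\;\; (t \mid x \diamond f(t_1 \ldots t_n)^\sigma \wedge C)((x \leftarrow f(x_1 \ldots x_n))) \,\}$ 

Sort: If $x$ is restricted by $\eta$ in $C$ and $\eta > \sigma$, then 

    $S \cup \{\, t \mid x \diamond f^\sigma \wedge C \,\} \;\to_n\; S \cup \{\, t \mid (x : \eta - \sigma \wedge C)\downarrow_s,\;\; t \mid (x : \sigma \wedge x \diamond f^\sigma \wedge C)\downarrow_s \,\}$ 

Structure: If $\exists u$ such that $t_1/u = \{x^\sigma \mid x^\sigma \diamond s_1\}$ and $t_2/u = \{y^\sigma \mid y^\sigma \diamond s_2\}$, $f^\sigma \in s_1$, 
but $f^\sigma \notin s_2$ and $\forall v$, $v <_{pos} u$, $t_1$ and $t_2$ have the same constructor 
symbol at position $v$, then 

    $S \cup \{\, t_1 \mid P_1,\; t_2 \mid P_2 \,\} \;\to_n$ 
    $S \cup \{\, t_1 \mid P_1,\;\; t_2 \mid (P_2 \wedge y^\sigma \diamond f^\sigma)\downarrow_s,\;\; (t_2 \mid P_2)((y^\sigma \leftarrow f(x_1 \ldots x_n))) \,\}$ 

Empty: $S \cup \{\, t \mid \bot \,\} \;\to_n\; S$ 

Figure 2: Normalization Rules 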
(Invariance) If $S \to_n S'$, then $[\![S]\!] = [\![S']\!]$ 

(Termination) There are no infinite chains $S \to_n S' \to_n \ldots$ 

(Completeness) If $S$ is not in normalized form, $\exists S'$ such that $S \;(\to_s \cup \to_n)\; S'$ 



Proof: (Sketch) The invariance claim is straightforward. Now, suppose $S$ is not in normalized form. 
If there exists a constrained term $t \mid \bot$, the empty rule can be applied. If there is $t \mid C$ such that $x \diamond f^\sigma$ is 
in $C$ and $x$ is restricted by $\eta$ with $\eta > \sigma$, the sort rule can be applied. If there exists $x \diamond f(t_1 \ldots t_n)^\sigma$, 
where $f : \sigma_1 \ldots \sigma_n \to \sigma \in \Sigma$ and $t_i$ is not a variable of sort $\sigma_i$, then the flattening rule can be applied. 
Suppose there exist two terms $t_i$ and $t_j$ and a position $u$ such that $t_i/u = \{x^\sigma \mid x^\sigma \diamond s_i\}$, 
$t_j/u = \{y^\sigma \mid y^\sigma \diamond s_j\}$, for all $v <_{pos} u$, $t_i$ and $t_j$ have the same structure symbol at position $v$, but $s_i \neq s_j$. 
If $f^\eta \in s_i$ or $s_j$ with $\sigma > \eta$, then the sort rule can be applied. Otherwise, the structure rule can be 
applied. To prove the termination claim over a set $\{t_1 \mid C_1, \ldots, t_n \mid C_n\}$ of constrained terms, note that 
rules are applied in order. Flattening decreases the complexity of right-hand sides of structure atoms 
of the $C_i$'s; sort decreases $\sum_i \sum_{(x \diamond f^\sigma) \in C_i} distance(x \diamond f^\sigma, C_i)$, where $distance(x \diamond f^\sigma, C_i)$ is defined 
as 0 if $x$ is restricted by $\sigma$ in $C_i$, 1 otherwise; structure decreases $|s_i - s_j| + |s_j - s_i|$, and empty 
decreases the number of constrained terms. $\Box$ 



5 Sequentiality and Optimality 

Compiling pattern matching consists of transforming a function defined by order-sorted 
patterns into a case-expression presented as a discrimination tree. The tree obtained is not 
always optimal, that is, it could fail to terminate on some terms that are not in the strict set of 
the pattern. As the evaluation mechanism is sequential, we must choose an order of verification, 
running the risk of losing some solutions. In a many-sorted framework, consider Berry's 
example [1] formed by the patterns $f(true, true, z)$, $f(false, y, true)$, $f(x, false, false)$. Given 
a term $f(\_, \_, \_)$, we must choose an argument position in order to start the matching. If we 
start at position three, the term $f(true, true, \bullet^{Bool})$ will not be matched, even though it belongs 
to the denotation of the first pattern. The same happens with the terms $f(false, \bullet^{Bool}, true)$ and 
$f(\bullet^{Bool}, false, false)$ if we start at the second or first positions, respectively. 
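Berry's example can be checked mechanically. The sketch below assumes a deliberately simplified, unsorted setting with flat patterns, where a root position may serve as a starting point only if no pattern has a variable there; `root_directions` is a hypothetical helper written for this illustration.

```python
# Flat patterns over constants; "_" marks a variable position.
BERRY = [
    ("true", "true", "_"),
    ("false", "_", "true"),
    ("_", "false", "false"),
]

def root_directions(patterns):
    """Argument positions (1-based) that every pattern needs to inspect."""
    arity = len(patterns[0])
    return [k + 1 for k in range(arity)
            if all(p[k] != "_" for p in patterns)]

def witnesses(patterns, position):
    """Patterns still matched by a term whose argument at position is undefined."""
    return [p for p in patterns if p[position - 1] == "_"]
```

For `BERRY`, `root_directions` is empty: whichever position is inspected first, `witnesses` returns a pattern whose denotation contains a term that is undefined exactly at that position.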

With a strict evaluation mechanism, an optimal PMT will be faster but the solutions (that is, 
those terms that match or not) will remain the same as that of a non-optimal tree. On the other 
hand, in a lazy evaluation framework, some non-optimal trees may fail to terminate due to 
unnecessary verifications that try to reduce subterms that do not have a head-normal form. 

In our framework, the sort of each term is examined before its structure, because the sort can 
be refined after usually only a few reduction steps whereas to examine the structure, more 
reduction steps are required to obtain head-normal form. The construction method for PMT's 
that we present here chooses a direction (intuitively, a position in a term at which to start 
reduction) and thus decides whether a subsort or structure verification is required. At each 
level of the tree, the structures and sorts are more precise than those of preceding levels. 

We propose a notion of sequentiality of the pattern matching predicate that takes not only the 
structure of terms into account, but also the sort system. Intuitively, a disjoint set $S$ of patterns 
is sequential in the sense of [7, 10] if it is possible to decide the matching property without 
doing some look-ahead. That is, for any constrained term $T$ not matching any pattern of $S$, 
there exists a position (a so-called direction) where a reduction must be performed in order to 
decide the matching property. Furthermore, this position can be determined without looking 
at the subterms of $T$ which are not computed yet. Our definition of sequentiality also requires the 
set $S$ of patterns to have the sort property: intuitively, $S$ has the sort property if whenever 
$u$ is a position of a constrained term $T$ to be evaluated and two different patterns of $S$ that are 
compatible with $T$ have variables $y^\sigma$, $y^\rho$ at position $u$ respectively, then $\sigma$ and $\rho$ are either 
disjoint or comparable sorts. In fact, if $\sigma$ and $\rho$ have a common subsort different from $\bot$, $\sigma$ 
and $\rho$, the position $u$ cannot be taken as a direction because unnecessary reductions of $T$ may 
be performed in order to distinguish between $\sigma$ and $\rho$. 

For example, consider the three unambiguous patterns $f(true, y^\sigma, z)$, $f(false, y^\delta, true)$, 
$f(x, y^\phi, false)$, where the subsort order is that of Example 3.1. If the PMT associated 
with this problem needs to know whether a term is of sort $\delta$ (resp. of sort $\sigma$) as in the case of 
$f(false, \bullet^\delta, true)$ (resp. $f(true, \bullet^\sigma, true)$), it will fail to terminate even though the term is in 
the denotation of the second (resp. first) pattern. 

Optimal PMT's will only fail to terminate on the strict set of the problem. It turns out that 
sequentiality of a pattern matching problem is equivalent to optimality of its tree. Thus, 
sequentiality becomes a necessary and sufficient condition for the construction of an optimal 
tree. We shall next give an effective decision procedure for sequentiality on disjoint sets of 
patterns. 

In reading the following section, familiarity with the work of [7] would be helpful, but the 
treatment is self-contained enough to be meaningful on its own. 

5.1 Sequentiality 

The set of positions or occurrences of a constrained term $t \mid C$, denoted $O(t \mid C)$, is defined as 
the set of positions of $t$, which is recursively defined as usual as finite sequences of positive 
integers such that $\epsilon \in O(t)$ and $k.u \in O(f(t_1 \ldots t_n))$ if $u \in O(t_k)$. We use $<_{pos}$ to denote the 
lexical ordering between positions. The subterm of $t$ at position $u$, denoted $t/u$, is defined as 
$t/\epsilon = t$ and $f(t_1 \ldots t_n)/k.u = t_k/u$. We use $(t \mid C)/u$ to denote the constrained term $(t/u) \mid D$, 
where $D$ is the constraint of all the atoms in $C$ restricting variables of $t/u$. For example, 
$(f(g(x^\sigma, a), y^\rho) \mid x^\sigma : \eta \wedge x^\sigma \diamond f \wedge y^\rho \diamond g)/1 = g(x^\sigma, a) \mid x^\sigma : \eta \wedge x^\sigma \diamond f$. If the replacement of 
the subterm of $t$ at position $u$ by a term $p$ is a well-sorted term, we define $t[u \leftarrow p]$ to be that 
term. If $T = t \mid C$ and $P = p \mid D$ are two constrained terms and $t[u \leftarrow p]$ is a well-sorted term, 
$T[u \leftarrow P]$ is defined as $t[u \leftarrow p] \mid (C \wedge D|_{\mathcal{V}(t[u \leftarrow p])})$. For example, if $g(x^\sigma, a)$ is a term of sort 



true if and only if $\exists P \in S$, $P \sqsubseteq M$. If truth values are considered to be ordered by $false \sqsubseteq true$, 
$S$ is a monotonic predicate on constrained terms. Increasing information about the term (in the 
sense of $\sqsubseteq$) can only change the value of the predicate to a favorable one. 




A position $u \in O(t \mid C)$ is said to be a direction of $T = t \mid C$ in a set of disjoint constrained 
patterns $S = \{t_1 \mid C_1, \ldots, t_n \mid C_n\}$ if and only if 

- $T/u$ has the form $x^\sigma \mid P$ 

- For every constrained term $M$ such that $T \sqsubseteq M$ and $S(M)$ is true, $M/u \not\sqsubseteq T/u$ 

- (Sort property) If $\exists i, j$ such that $\forall v$, $v <_{pos} u$, $t_i$ and $t_j$ have the same constructor symbol 
at position $v$; $t_i/u$ is a variable restricted by $\eta_i$ in $C_i$ and $t_j/u$ is a variable restricted by $\eta_j$ 
in $C_j$, then $\eta_i \sqcap \eta_j = \bot$ or $\eta_i \le \eta_j$ or $\eta_j \le \eta_i$. 
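The sort property is easy to test once sorts are given a concrete representation. The following sketch assumes a boolean lattice of sorts encoded as sets of atomic sorts, so that the greatest common subsort is set intersection, $\bot$ is the empty set and $\le$ is set inclusion; the atomic sorts `a`, `b`, `c` are hypothetical.

```python
def sort_property(eta_i, eta_j):
    """True when the two sorts are disjoint or comparable."""
    disjoint = not (eta_i & eta_j)                     # eta_i meet eta_j = bottom
    comparable = eta_i <= eta_j or eta_j <= eta_i      # one is a subsort of the other
    return disjoint or comparable

SIGMA = frozenset({"a", "b"})   # assumed sort made of atoms a and b
DELTA = frozenset({"b", "c"})   # assumed sort overlapping SIGMA on atom b
PHI = frozenset({"c"})          # assumed sort below DELTA and disjoint from SIGMA
```

With these assumed sorts, `SIGMA` and `DELTA` violate the property because their intersection is nonempty and differs from both, whereas `SIGMA` with `PHI` (disjoint) and `DELTA` with `PHI` (comparable) satisfy it.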

Lemma 1 Let $T$ be a constrained term $t \mid C$. A position $u$ is a direction of $T$ in $S = \{S_1, \ldots, S_n\}$ 
if and only if $T/u$ has the form $x^\sigma \mid P$ and for every constrained pattern $S_i \in S$ compatible with 
$T$ we have that $u$ is an occurrence of $S_i$, $S_i/u \not\sqsubseteq T/u$ and the sort property holds. 

Proof: Let $u$ be a direction of $T$ in $\{S_1, \ldots, S_n\}$ and $S_i$ be a constrained pattern compatible with $T$. 
Then $T/u$ has the form $x^\sigma \mid P$ with $x^\sigma$ restricted by $\phi$ in $P$. Suppose that $u \notin O(S_i)$. Then, there exists 
a position $v$ such that $u = v.w$, $w \neq \epsilon$ and $S_i/v$ is $y^\rho \mid D$. Since $S_i$ is compatible with $T$, there exists 
a constrained term $M = m \mid F$ such that $S_i \sqsubseteq M$, $T \sqsubseteq M$; that is, $M/v$ is an instance of $y^\rho \mid D$. Now, 
$(M/v)[w \leftarrow z^\phi \mid P] = M[u \leftarrow z^\phi \mid P]$ is a well-sorted constrained term that is also an instance of $S_i$ and 
obviously of $T$. Therefore $M/u \sqsubseteq T/u$ by construction, which contradicts the hypothesis. 

Conversely, let $M$ be a constrained term such that $T \sqsubseteq M$ and $S_i \sqsubseteq M$ and suppose $M/u \sqsubseteq T/u$. 
Then, $S_i/u \sqsubseteq M/u \sqsubseteq T/u$, which contradicts our hypothesis. $\Box$ 

By normalization, a complete decomposition $S$ reduces to another complete decomposition $S'$ 
and the set of directions of any term $T$ in $S$ is the set of directions of $T$ in $S'$. 

We say that a constrained term $T$ is compatible with a set of disjoint constrained patterns $S$ if 
and only if there exists $M$ such that $T \sqsubseteq M$ and $S(M)$ is true. In particular, if $S$ has only one 
element $P$, $T$ is compatible with $S$ if and only if $T$ and $P$ are unifiable, i.e., $\exists M$ such that 
$T \sqsubseteq M$ and $P \sqsubseteq M$. 

A set of disjoint constrained patterns $S$ is sequential in a constrained term $T$ if and only 
if, whenever $S(T)$ is false but $T$ is compatible with $S$, there exists a direction of $T$ in 
$S$. We say that $S$ is sequential if and only if it is sequential in all constrained terms in 
normalized form. Sequentiality of a predicate $S$ is the possibility of systematically expanding 
any constrained term step by step until either the predicate is true or it is clear that a positive 
answer is impossible. 

The sort property enriches the known notions of sequentiality by taking into account the sort 
system. When a variable's position is restricted by two sorts with nonempty and nontrivial 
intersection, some solutions are lost, as illustrated by the following example. 

Example 5.1 With the subsort order of Example 3.1, the following set of disjoint 
constrained patterns is not sequential: 

$\{\; h(p, y^\tau, z^\rho) \mid y^\tau : \sigma,\;\; h(x^\rho, y^\tau, p) \mid x^\rho \diamond p \wedge y^\tau : \delta,\;\; h(x^\rho, y^\tau, q) \mid y^\tau : \phi \;\}$ 






If pattern matching starts at the first position (resp. at the third), it will fail to terminate on the 
term $h(\bullet^\rho, f(a, a), q)$ (resp. $h(p, b, \bullet^\rho)$), even though it belongs to the denotation of the third 
(resp. first) pattern. Now, note that the first and second constrained terms have variables at 
position 2 which are restricted by sorts with nonempty and nontrivial intersection. If matching 
starts at this position asking whether or not a term is of sort $\sigma$ (resp. of sort $\delta$), it will fail to 
terminate on $h(q, \bullet^\delta, p)$ (resp. $h(p, \bullet^\sigma, p)$), even though it is in the denotation of the second 
(resp. first) pattern. 

5.2 Construction of pattern matching trees 

A pattern matching tree (PMT) for a constrained term $T$ and a complete decomposition 
$S = \{S_1, \ldots, S_n\}$ is defined as: 

- $T$ is the root and each node is a constrained linear pattern in simplified form. 

- If $u$ is the direction of $P$ in $S$ and $T_1, \ldots, T_k$ are the children of $P$, then: 

  - $T_1 \ldots T_k$ are pairwise incompatible constrained terms; 

  - $P/u$ has the form $x^\sigma \mid \top$, $T_i$ and $P$ only differ on $u$ and $P/u \sqsubseteq T_i/u$; 

  - for every $T_i$, there exists a pattern $S_j$ such that $T_i \sqsubseteq S_j$. 

- If $H_1 \ldots H_m$ are the leaves of the tree, then $\{S_1, \ldots, S_n\} \to^* \{H_1, \ldots, H_m\}$ 

A PMT of a complete decomposition $\{S_1, \ldots, S_n\}$ is a pattern matching tree for the constrained 
term $x^\tau \mid \top$ and $\{S_1, \ldots, S_n\}$. A PMT of $\{S_1, \ldots, S_n\}$ is optimal if and only if it fails to 
terminate only for the strict set of $S_1 \ldots S_n$. 

We now describe an algorithm TREE that constructs a PMT for a constrained term $T = t \mid C$ 
and a complete decomposition $\{S_1, \ldots, S_n\}$. If $C$ is non-consistent, return the empty tree. 
Otherwise, if $T$ is an $S_i$, return the single-node tree $T$. Otherwise, normalize $\{S_1, \ldots, S_n\}$ 
into $\{H_1, \ldots, H_m\}$ and search for a direction $u$ of $T$ in $\{H_1, \ldots, H_m\}$. If such a direction 
cannot be found, $\{H_1, \ldots, H_m\}$ is not sequential in $T$, so fail. Otherwise, proceed with 
DIR$(T, \{H_1, \ldots, H_m\}, u)$ where $T = t \mid C$; $T/u = x^\sigma \mid \top$; $H_i = h_i \mid C_i$ and DIR is defined by: 

DIR$(T, \{H_1, \ldots, H_m\}, u)$ = 

    Let Sorts be the maximal sorts of $\{\, \eta \mid T \sqsubseteq H_i,\; h_i/u \in \mathcal{V}$ is restricted by $\eta \,\}$ and 
    let Forms be $\{\, f_i \mid T \sqsubseteq H_i,\; h_i/u = f_i(\ldots) \,\}$ in 
    if Sorts = $\emptyset$ 
    then (a structure step) build a tree rooted at $T$ with children: 
        TREE$(T[u \leftarrow f(\ldots)]\downarrow_s, \{H_1, \ldots, H_m\})$ for each $f$ in Forms and 
        TREE$(t \mid (C \wedge x^\sigma \diamond Forms)\downarrow_s, \{H_1, \ldots, H_m\})$ 
    else (a sort step) build a tree rooted at $T$ with children: 
        TREE$(t \mid (C \wedge x^\sigma : \eta)\downarrow_s, \{H_1, \ldots, H_m\})$ for each $\eta \in$ Sorts and 
        TREE$(t \mid (C \wedge x^\sigma : \sigma - Sorts)\downarrow_s, \{H_1, \ldots, H_m\})$ 
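The overall shape of the construction can be illustrated on a much smaller problem. The sketch below is not the algorithm above: it assumes an unsorted setting with flat patterns over constants, so it performs structure steps only and has no sort steps or normalization. A node is a tuple whose entries are either a known constant or a set of excluded constants; all names (`tree`, `build`, `NotSequential`) are hypothetical.

```python
class NotSequential(Exception):
    """Raised when some node has no direction."""

def compatible(node, pattern):
    for entry, p in zip(node, pattern):
        if p == "_":
            continue
        if isinstance(entry, frozenset):
            if p in entry:            # p was already ruled out at this position
                return False
        elif entry != p:
            return False
    return True

def is_instance(node, pattern):
    # A set entry (unknown argument) is an instance only of a variable position.
    return all(p == "_" or entry == p for entry, p in zip(node, pattern))

def tree(node, patterns):
    live = [p for p in patterns if compatible(node, p)]
    if not live:
        return ("fail",)
    for p in live:
        if is_instance(node, p):
            return ("leaf", p)
    # A direction is an unknown position that every live pattern inspects.
    directions = [k for k, entry in enumerate(node)
                  if isinstance(entry, frozenset)
                  and all(p[k] != "_" for p in live)]
    if not directions:
        raise NotSequential(node)
    u = directions[0]
    forms = sorted({p[u] for p in live})
    children = {f: tree(node[:u] + (f,) + node[u + 1:], patterns) for f in forms}
    # Last child: the argument at u is none of the symbols in forms.
    rest = node[:u] + (node[u] | frozenset(forms),) + node[u + 1:]
    children["other"] = tree(rest, patterns)
    return ("switch", u + 1, children)

def build(patterns):
    return tree(tuple(frozenset() for _ in patterns[0]), patterns)

SEQUENTIAL = [("true", "_"), ("false", "true"), ("false", "false")]
BERRY = [("true", "true", "_"), ("false", "_", "true"), ("_", "false", "false")]
```

For `SEQUENTIAL`, the tree switches on position 1 and reaches a leaf for `("true", "_")` without ever inspecting position 2, which is the lazy behaviour the construction aims at. For `BERRY`, `build` raises `NotSequential` at the root.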
Example 5.2 Let us consider the subsort order of Example 3.1. Let a be the sort a U 8 
and g be the constructor of (f> whose domain sort is a x nat, with nat and a disjoint sorts. 
Consider the pattern g(x a ,y nat ), ^(x^O), g(x s ,l). Normalization yields: 



g{**,0)\**:<l> 
g(x*,l)\x* :(/> 
y T \y T : nat U a 

g{x a , y nat ) \x a : <j> A O 0 A y nat O 1 



Figure 3 shows the PMT. The directions are between square brackets. 

Figure 3: Pattern Matching Tree 

Theorem 4 A finite complete decomposition in normalized form $S = \{S_1, \ldots, S_n\}$ is sequential 
in every normalized term if and only if it is sequential in every node of its associated 
PMT. 



Proof: Since nodes of the PMT are in normalized form, the left to right implication is evident. 
Conversely, let $M$ be a constrained term in normalized form such that $S(M)$ is false and $M$ is 
compatible with $S$. As $x^\tau \mid \top$ is the root of the pattern matching tree but $M$ does not match any pattern 
of $S$, there exists a node $T = t \mid C$ whose children are $T_1 \ldots T_m$ and whose direction in $S$ is $u$ such that 
$T \sqsubseteq M$ and for all $1 \le i \le m$, $T_i \not\sqsubseteq M$. We will prove that $u$ is a direction of $M$ in $S$, that is (by 
Lemma 1) the sort property holds, (1) $M/u$ has the form $x^\rho \mid P$ and for all $S_i \in S$ compatible with $M$, 
(2) $u$ is an occurrence of $S_i$ and (3) $S_i/u \not\sqsubseteq M/u$. 

By definition of discrimination tree, $T/u$ is $x^\sigma \mid \top$ and so (1) holds. By construction each level is either 
a sort or a structure step and since $u$ is a direction of $T$ in $S$ the sort property holds. 

In the sort step case, $T_1/u \ldots T_m/u$ are variables $x^{\rho_1} \ldots x^{\rho_m}$ and (1) immediately holds. Since $T \sqsubseteq M$ 
and nodes at the same level of $T$ are defined to be incompatible, if $S_i$ is a pattern compatible with $M$, 
then $T \sqsubseteq S_i$ and (2) holds. By construction, for every $S_j$ compatible with $T$ we have $S_j/u \not\sqsubseteq T/u$. In 
particular we have $S_i/u \not\sqsubseteq T/u$. Now, if $T/u = M/u$, (3) holds. Otherwise, suppose $S_i/u \sqsubseteq M/u$. As 
$M$ is in normalized form and $\sigma$ is not a minimal sort, $M/u$ does not have any structure atom and then 
$M/u$ is of the form $x^\rho \mid \top$ and $\sigma > \rho$. Since $S_i/u \sqsubseteq M/u$, $S_i/u$ is $x^{\xi_i} \mid \top$ and $\xi_i \ge \rho$. By hypothesis, for 
all $1 \le j \le m$, $T_j/u \not\sqsubseteq M/u$ and then $\rho_j \not\ge \rho$. As $T \sqsubseteq S_i$, there exists $T_j$ such that $T_j \sqsubseteq S_i$ and then 
$\rho_j = \xi_i$. Then $\rho_j \ge \rho$, which contradicts the hypothesis. 

In the structure step case, $T_1/u = f_1, \ldots, T_{m-1}/u = f_{m-1}$ and $T_m/u = x^\sigma \mid x^\sigma \diamond \{f_1 \ldots f_{m-1}\}$. Since 
$T \sqsubseteq M$ and for all $1 \le i \le m$, $T_i \not\sqsubseteq M$, $T/u = M/u$ or $M/u = x^\sigma \mid x^\sigma \diamond s$. In the first case, we have 
$u$ is a direction of $M$ in $S$. In the second case (1) immediately holds. Let $S_i$ be a constrained pattern 
compatible with $M$. As in the sort step case, $T \sqsubseteq S_i$ and (2) holds too. Now, suppose $S_i/u \sqsubseteq M/u$. 
Then $S_i/u$ is a variable and there must be a child $T_j$ of $T$ such that $T_j \sqsubseteq S_i$. We have $T_j/u$ is also a 
variable and thus necessarily $j = m$ and $\{f_1 \ldots f_{m-1}\} \subseteq s$. By construction, $T$ and $T_j$ only differ on $u$ 
and $T \sqsubseteq M$. Therefore $T_j \sqsubseteq M$, which contradicts the hypothesis, and thus (3) holds. $\Box$ 

Theorem 5 A PMT of a complete decomposition S in normalized form is optimal iff S is 
sequential. 

Proof: By Theorem 4, $\{S_1, \ldots, S_n\}$ is sequential if and only if there exists a discrimination tree in 
which each node $t \mid C$ has a direction in the set $\{S_1, \ldots, S_n\}$ and the sort property holds. The set of 
terms for which the algorithm does not terminate is generated at each node $t \mid C$ of the PMT by some 
terms of the form $(t \mid C)[u \leftarrow \bullet^\rho]$, where $u$ is the chosen direction of $t \mid C$ in $\{S_1, \ldots, S_n\}$. By definition 
$\{S_1, \ldots, S_n\}$ is optimal if and only if it fails to terminate only for the strict set of $\{S_1, \ldots, S_n\}$. We 
must verify that the algorithm fails to terminate in $(t \mid C)[u \leftarrow \bullet^\rho]$ if and only if it is in the strict set 
of $\{S_1, \ldots, S_n\}$. The right to left implication is evident. Conversely, by construction each level is 
either a sort or a structure step. If $T_1, \ldots, T_m$ are the children of the node $t \mid C$, two cases are to be 
considered: 

In the structure level case, $T_i = (t \mid C)[u \leftarrow f_i^\eta(\ldots)]$, $i = 1 \ldots m-1$ and $T_m = t \mid (C \wedge x^\sigma \diamond \{f_1^\eta, \ldots, f_{m-1}^\eta\})$. 
By normalization $T/u$ is a variable restricted by $\eta$ and then $\rho = \eta$. Thus, $(t \mid C)[u \leftarrow \bullet^\eta]$ is in the 
strict set of each $T_i$ and then in the strict set of each leaf $S_j$ such that $T_i \sqsubseteq S_j$. Then $(t \mid C)[u \leftarrow \bullet^\eta]$ is 
in the strict set of $\{S_1, \ldots, S_n\}$. 

In the sort level case, $T/u$ has the form $x^\sigma \mid \top$ and by construction each $T_i$ is of the form $t \mid (C \wedge x^\sigma : \eta_i)$ 
with $\eta_i < \sigma$. Since the decomposition is complete, $\bigsqcup_{i=1}^{m} \eta_i = \sigma$ and by the sort property $\eta_1 \ldots \eta_m$ 
are pairwise disjoint sorts. If the algorithm fails to terminate in $(t \mid C)[u \leftarrow \bullet^\rho]$, there exists at least 
one $i$ such that $\rho \not\le \eta_i$ and $\rho \sqcap \eta_i \neq \bot$. We have $(t \mid C)[u \leftarrow \bullet^\rho]$ in the strict set of $T_i$. Now, for all $S_j$ 
such that $T_i \sqsubseteq S_j$, we have either $T_i/u = S_j/u$ or $S_j/u = f^{\eta_i}$ or $S_j/u = x^{\eta_i} \mid x^{\eta_i} \diamond \{f^{\eta_i}, g^{\eta_i}, \ldots\}$ and then 
$(t \mid C)[u \leftarrow \bullet^\rho]$ is in the strict set of $S_j$. Then $(t \mid C)[u \leftarrow \bullet^\rho]$ is in the strict set of $\{S_1, \ldots, S_n\}$. $\Box$ 

6 Discussion 

Unitary signatures have been defined in Section 2 to be regular and to verify some constraints 
over the set of sort symbols and the set of function and constructor declarations. Since 
complements of sorts are often used during the compilation scheme, we require the lattice of 
sorts to be boolean. On the other hand, regularity is a sufficient condition for signatures to 
be finitary unifying. Nevertheless, it may not be clear why we restrict our interest to unitary 
ones. We present two simple transformation rules acting on signatures. A signature obtained 
through these rules is constructed to verify the minimal codomain sort and the disjoint domain 
sort properties. The transformation preserves the set of well-sorted ground constructor terms 
(i.e. the free order-sorted term algebra) and so we can think of our compilation scheme as one 
that transforms not only patterns, but also signatures. We show in what follows how to obtain 
this "compiled" version. 

First, let us justify the need for constructors to verify the hypothesis of minimal codomain sort 
(the third condition of unitarity). Let $\Sigma = (S, \le, \mathcal{F}, \mathcal{C}, \mathcal{V}, \mathcal{D})$ be a regular signature which 
does not satisfy this condition and $(S, \le)$ be a boolean lattice. Then, there exists a constructor 
$f \in \mathcal{C}$ such that $f : \eta \to \sigma \in \mathcal{D}$ and $\sigma$ is not a minimal sort. The downward signature $\Sigma'$ 
obtained from $\Sigma$ is 

$(S \cup \{\lambda\} \cup \{\lambda^c\},\; \le \cup \{\sigma > \lambda\} \cup \{\lambda^c > \eta \mid \eta \in \Gamma\},\; \mathcal{F},\; \mathcal{C},\; \mathcal{V},\; (\mathcal{D} - \{f : \eta \to \sigma\}) \cup \{f : \eta \to \lambda\})$ 

where $\lambda$ and $\lambda^c$ are new sort symbols and $\Gamma$ is the set of maximal sorts of $\{\eta \mid \eta \sqcap \lambda = \bot\}$. 

Note that $\sigma$ is strictly greater than the new symbol $\lambda$, which is now a minimal sort. Intuitively, 
$\Sigma'$ has the same structure, but constructors are in a "lower level" of the lattice. When the 
number of non-minimal codomain constructors is finite, this transformation terminates and the 
same set of ground constructor terms can be built in $\Sigma'$. The new partially ordered set of sorts is 
also a boolean lattice. 

Now, suppose the minimal codomain sort condition is verified whereas the disjoint domain 
sort is not. Let $\Sigma = (S, \le, \mathcal{F}, \mathcal{C}, \mathcal{V}, \mathcal{D})$ and $f$ a constructor with two different declarations 
$f : \sigma_1 \ldots \sigma_n \to \sigma$ and $f : \eta_1 \ldots \eta_m \to \eta$. If $n = m = 0$, then $f : \to \sigma$ and $f : \to \eta$ implies 
$\sigma = \eta$, because $\sigma$ and $\eta$ are minimal and whenever $\Sigma$ is regular they must be comparable. 
Then $n = m \ge 1$ and $\vec{\sigma}$, $\vec{\eta}$ are not disjoint sorts. In this case there exists a term (not necessarily 
a ground term) having $\eta$ and $\sigma$ as sorts. Since $\Sigma$ is regular and $\eta$, $\sigma$ are minimal we also have 
$\sigma = \eta$. The disjoint domain signature obtained from $\Sigma$ is $\Sigma' = (S, \le, \mathcal{F}, \mathcal{C}, \mathcal{V}, \mathcal{D}')$, where 
the new set of declarations is defined in this way: 

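- If $\vec{\sigma} \le \vec{\eta}$, $f : \vec{\sigma} \to \sigma$ is redundant and we can remove it. $\mathcal{D}'$ is $\mathcal{D} - \{f : \vec{\sigma} \to \sigma\}$ 

- Otherwise, 

  - $I_\sigma = \{\, i \in [1 \ldots n] \mid \sigma_i - \eta_i \neq \bot \,\}$ 

  - $\forall i \in I_\sigma$, $\forall j \in [1 \ldots n]$, $\xi_j$ is $\sigma_i - \eta_i$ if $i = j$, $\sigma_j$ otherwise 

  - $I_\eta = \{\, i \in [1 \ldots n] \mid \eta_i - \sigma_i \neq \bot \,\}$ 

  - $\forall i \in I_\eta$, $\forall j \in [1 \ldots n]$, the $j$-th domain sort is $\eta_i - \sigma_i$ if $i = j$, $\eta_j$ otherwise 

  - $D_\sigma = \{\, f : \xi_1 \ldots \xi_n \to \sigma \mid \ldots \,\}$ 

  - $\mathcal{D}'$ is $\mathcal{D} - \{f : \vec{\sigma} \to \sigma,\; f : \vec{\eta} \to \sigma\} \cup \{f : \vec{\eta} \sqcap \vec{\sigma} \to \sigma\} \cup D_\sigma \cup D_\eta$ 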
This transformation always terminates and the obtained declarations have incompatible 
domains by definition of "−" (set difference). The same set of ground constructor terms can 
still be built. 

Comon [2] noticed that an order-sorted signature $\Sigma$ is a finite bottom-up tree automaton where 
the set of final states is the set of sorts $S$. It turns out that the set of well-sorted terms of sort 
$\sigma$ is the set of trees recognized by a tree automaton at the final state corresponding to $\sigma$. The 
transformations of signatures we have proposed above are simply transformations of their tree 
automata. When restricting the set of final states of the new tree automaton $\Sigma'$ to that of $\Sigma$, 
the same set of well-sorted terms is recognized. 
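The automaton reading of a signature can be made concrete with a short sketch. It assumes a toy signature with sorts `nat` < `int` and three hypothetical constructors; `sort_of` runs the declarations bottom-up, treating sorts as states, and returns the sort reached at the root or `None` when the term is not well-sorted. With one declaration per symbol, as assumed here, the first applicable declaration is the only one.

```python
# Each sort maps to the set of its supersorts (reflexive): nat < int.
SUPERSORTS = {"nat": {"nat", "int"}, "int": {"int"}}

# Hypothetical declarations: symbol -> list of (domain sorts, codomain sort).
DECLARATIONS = {
    "zero": [((), "nat")],
    "succ": [(("nat",), "nat")],
    "neg": [(("nat",), "int")],
}

def is_subsort(a, b):
    return b in SUPERSORTS[a]

def sort_of(term):
    """Bottom-up run: terms are tuples (symbol, arg1, ..., argn)."""
    symbol, args = term[0], term[1:]
    arg_sorts = [sort_of(a) for a in args]
    if None in arg_sorts:
        return None                      # some argument is not well-sorted
    for domain, codomain in DECLARATIONS.get(symbol, []):
        if len(domain) == len(arg_sorts) and all(
                is_subsort(s, d) for s, d in zip(arg_sorts, domain)):
            return codomain              # transition of the automaton
    return None

def recognized(term, final_sort):
    """The term is accepted at the final state corresponding to final_sort."""
    reached = sort_of(term)
    return reached is not None and is_subsort(reached, final_sort)
```

Here `succ(zero)` is recognized at both `nat` and `int`, while `succ(neg(zero))` is rejected because `neg` produces an `int` and `succ` expects a `nat`.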

Unsorted and many-sorted signatures are particular cases of unitary ones, and therefore our 
work also remains applicable to them. There are two other interesting order-sorted type 
systems to be considered. The work of Aït-Kaci and Smolka [13] has shown that feature 
types and constructor types are dual concepts. In this kind of system every constructor symbol 
has exactly one declaration and is a constructor of a minimal sort. In addition, the set of 
feature terms is a prelattice, provided the sort symbols are ordered as a lattice. On the other 
hand, Smolka [12] proposes a discipline with polymorphic order-sorted types restricted to free 
constructors. Specification of the inclusion order between types is defined via special classes 
of terminating rewriting systems and no function symbol has more than one declaration. 
He shows that the set of sort terms equipped with the order specified by the rewriting rules is 
a well-founded quasi-lattice having $\bot$ as its least element. Reasonable algorithms to compute 
the greatest common subsort and least common supersort of two sort terms are given. Our 
order-sorted framework also allows us to accommodate pattern matching in languages with 
such a type system. 

7 Conclusion 

The method of treating ambiguous linear order-sorted pattern matching presented in this paper 
generalizes previous work on non-ambiguous linear patterns [7], ambiguous linear patterns [9] 
and ambiguous linear patterns using constrained terms [10]. We extend several notions 
introduced in [10], such as constrained terms, non-reducible $\bullet$ terms, strict sets of patterns, 
sequentiality and pattern-matching trees, to the order-sorted case. We define discrimination 
trees to have edges labeled not only with structure constraints, but also with subsort restrictions. 
This feature makes it possible to decide pattern matching without reducing terms to normal forms, 
taking advantage in this way of the lazy evaluation strategy. It turns out that our method constructs 
optimal order-sorted PMT's for sequential order-sorted pattern matching problems and can be 
used with either a lazy or a strict evaluation strategy. As in [10], our method can also be used 
for non-sequential problems. 

Our general order-sorted framework accommodates lazy pattern matching on all the regular 
systems described in Section 6. Compilation of non-linear and higher-order patterns remains 
as further research work. 